Here is a quick summary of our data. We see how many times values occured in our data frame as well as what were the quartiles values for our data.
summary(sales_final)
## sku customer_name ship_city ship_state
## FNG00015:10 CUSTOMER 1:58 TLAJOMULCO DE ZUNIGA:12 MX :26
## FNG00014: 8 CUSTOMER 3: 9 CUAUTITLAN IZCALLI :10 TB :18
## FNG00030: 7 CUSTOMER 6: 9 VILLAHERMOSA :10 JA :15
## FNG00031: 7 CUSTOMER 2: 2 CHALCO : 8 NL :14
## FNG00033: 7 CUSTOMER 4: 2 CULIACAN : 8 SI : 8
## FNG00043: 7 CUSTOMER 5: 2 MONTERREY : 8 CH : 1
## (Other) :37 (Other) : 1 (Other) :27 (Other): 1
## zip_code shipping_method package_24_shipment cases_sold
## 86280 :12 Delivery:83 Min. : 175 Min. : 252
## 45679 : 8 1st Qu.: 8898 1st Qu.: 12264
## 56640 : 8 Median : 20664 Median : 24840
## 66550 : 8 Mean : 58039 Mean : 47306
## 80130 : 8 3rd Qu.: 73386 3rd Qu.: 62118
## 54769 : 6 Max. :326400 Max. :213612
## (Other):33
## avg_spend
## Min. : 1.566
## 1st Qu.: 7.142
## Median :12.391
## Mean :10.950
## 3rd Qu.:15.707
## Max. :16.975
##
Now lets take a much closer look and see in which cities customers are buying these water cases. ***
ggplot(sales_final, aes(x = customer_name, fill = ship_city))+
geom_bar(color = 'black')+
labs(x = 'Customer', y= 'Where that customer is buying from', fill = 'City')
Here we show which city had the most sold cases. What was the breakdown of the shipments to each city. We can see which cities are the ones who are buying the most cases and which ones are buying the average amount of around 47306 cases. We can make cost efficient decisions in the next quarter so to maximize our orders with those cities that are buying the most cases.
#This is the total amount of cases that were shipped to each city
print(cities)
## # A tibble: 16 x 2
## ship_city total_sold
## <fct> <int>
## 1 CHALCO 410328
## 2 CHIHUAHUA 12096
## 3 CIENEGA DE FLORES 4032
## 4 CUAUTITLAN IZCALLI 930676
## 5 CULIACAN 541140
## 6 EL SALTO 12628
## 7 ESCOBEDO 197821
## 8 GOMEZ PALACIO 4032
## 9 GUADALUPE 60192
## 10 MONTERREY 670048
## 11 RAMOS ARIZPE 616
## 12 TEOLOYUCAN 76552
## 13 TLAJOMULCO DE ZUNIGA 501454
## 14 TULTITLAN 37392
## 15 VILLAHERMOSA 177761
## 16 VILLAHERMOSA TABASCO 289657
#Lets show the weight of total cases sold to each city
pie
ggplot(sales_final, aes(x = cases_sold, fill = ship_city))+
geom_histogram(color = 'black')+
labs(x = 'Cases Sold', fill = "City")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Here we see that I divided up the information and plotted it on a map of Mexico the busiest place that the cases were shipped for Q1 of 2017. This is good to visualize where we can focus more of the shipments in the future as to be sure to maximize shipments and profits.
ggplot(mexico, aes(x = long, y = lat)) +
geom_polygon(aes(group = group), fill = "white",
color = "gray40", size = .2) +
geom_point(data = sales_final2, aes(x = long, y = lat, size = cases_sold, color = ship_city))+
scale_size_continuous(range = c(8,15))+
labs(color = 'City of Destination', size = "Average cost of shipment", title ='Comparing the volume of shipment to each city in Mexico')
Here we have a boxplot that shows us the median price of shipping across all the cities in Q1. We see what the maximum value and minimum value were for shipping to that respective city. The average cost of shipment was $10.95.
ggplot(sales_final, aes(x = reorder(ship_city, avg_spend, median), y = avg_spend))+
geom_boxplot()+
theme_bw()+
theme(axis.text.x=element_text(angle=45,hjust=1))+
labs(x = "City", y = "Shipment Cost per Case(in Dollars)")
Here we are showing the average shipping price of shipping cases of water against how many cases where sold at that price point. We then want to see how many complete 24 packs of water of those cases sold compare against to the price of the shipment. Since different products have different bottle sizes and pack sizes, so “24-pack equivalent” cases are used as a scale for a comparison between different product types.
ggplot(sales_final, aes(x = avg_spend, y = cases_sold))+
geom_point(aes(size = package_24_shipment), alpha = 1/3)+
labs(x = 'freight cost', y = 'cases sold', size = '24-pack equivalents')
Now we want to smooth out this data to see what sort of relationships each state state has with regards to the cases sold, shipping cost, and 24 pack equivalents. We can determine what relationship our data has form this information. As cases start to get larger we see which states comsume the most of our product, while others start to fall off or comsume less. We can also see at what amount states start to increase their consumptions. This informations is valuable as it would lead to know how many cases to estimate to each state which can result in having a more efficient second quarter.
ggplot(sales_final, aes(x = avg_spend, y = cases_sold))+
geom_point(aes(size = package_24_shipment), alpha = 1/3)+
geom_smooth(aes(color = ship_state), se = FALSE) +
labs(x = 'freight cost', y = 'cases sold', size = '24-pack equivalents')
## `geom_smooth()` using method = 'loess'
With this data we can now make accurate estimations for quarter 2 of sales and freight. We can estimate how much it would cost to ship to these same locations in quarter 2 as well as know about how much each respective state and city will consume in the future. The data also tells us what the perfect amount of cases would be for each state as to now have them start to consume less product; therefore, making our shipment process much more efficient as we will know how much a specific state/city will expect to consume.